feat: add langchain deepagents backend integration #1

Open
johannhartmann wants to merge 8 commits into main from feat-langchain-deepagents-backend
@johannhartmann
Member

  • What: Adds langchain-agent-sandbox integration package for LangChain DeepAgents, including adapter implementation, tests, and docs.
  • Why: Keeps LangChain/DeepAgents dependencies optional while enabling DeepAgents tooling for agent-sandbox users.
  • How:
    • Implements SandboxBackendProtocol with protocol-compliant error mapping.
    • Adds unit tests + gated e2e test (via LANGCHAIN_* env vars).
    • Adds agentic_sandbox[langchain] extra and README usage snippet.

  Testing

  • ruff check …
  • mypy (root mypy.ini)
  • bandit -r …
  • make test-langchain (optional, not run on CI unless configured)
  • Optional e2e: python -m pytest test/e2e/clients/python/test_e2e_langchain_backend.py with LANGCHAIN_* vars

@johannhartmann johannhartmann force-pushed the feat-langchain-deepagents-backend branch 2 times, most recently from a943354 to d750e34 Compare January 25, 2026 18:41
@johannhartmann johannhartmann force-pushed the feat-langchain-deepagents-backend branch 3 times, most recently from a52322f to d3df291 Compare February 7, 2026 13:47
@johannhartmann johannhartmann force-pushed the feat-langchain-deepagents-backend branch 2 times, most recently from 26dd56c to 015ad58 Compare February 11, 2026 09:22
@johannhartmann johannhartmann force-pushed the feat-langchain-deepagents-backend branch from f202be6 to f7f6921 Compare February 19, 2026 18:10
@johannhartmann johannhartmann force-pushed the feat-langchain-deepagents-backend branch 3 times, most recently from daeb330 to 4bdb0c4 Compare March 13, 2026 07:37
@johannhartmann johannhartmann force-pushed the feat-langchain-deepagents-backend branch from 4bdb0c4 to 09e0021 Compare March 24, 2026 12:52
@johannhartmann johannhartmann force-pushed the feat-langchain-deepagents-backend branch 2 times, most recently from 7b91eeb to 6561716 Compare April 7, 2026 16:50
xiaoj655 and others added 7 commits April 7, 2026 23:17
…uter (kubernetes-sigs#371)

* fix: preserve query string when proxying requests in sandbox router

* test: add unit test

* chore

* trigger ci
…kubernetes-sigs#511)

* Updates deploy-to-kube script to give the options of deploying the controller with extensions installed

* Updates the Makefile to use the new flag

* Uses metadata name to identify controller instead of file name

* Adds controller as a variable to deploy-kind

* Enables extensions flag in the CI test suite
…kubernetes-sigs#531)

* feat(python): include spec.lifecycle in SandboxClaim at creation time

Add shutdown_after_seconds parameter to create_sandbox() so claims
are expire-safe from birth. Previously, setting a TTL required a
separate PATCH after creation, leaving a vulnerability window where
a client crash could orphan claims with no expiration.

The new keyword-only parameter computes a UTC shutdown time and
includes spec.lifecycle (shutdownTime + shutdownPolicy: Delete) in
the initial manifest. Validation rejects non-int, non-positive, and
overflow values. Shared build_lifecycle() utility in lifecycle.py
avoids drift between sync and async clients.

No controller or CRD changes needed — the lifecycle field already
exists and is read on every reconcile.

Made-with: Cursor

* test: add integration test for lifecycle-at-creation code path

Exercises the full path from create_sandbox(shutdown_after_seconds=N)
through build_lifecycle(), _create_claim(), and K8sHelper down to the
manifest body passed to the K8s API — only the API transport is mocked.

Validates:
- lifecycle dict appears in spec when shutdown_after_seconds is set
- shutdownTime falls in the expected UTC window
- shutdownPolicy is "Delete"
- no lifecycle when shutdown_after_seconds is omitted
- validation rejects invalid input before any K8s API call

Made-with: Cursor

* fix: address PR review feedback

- Rename build_lifecycle -> construct_sandbox_claim_lifecycle_spec,
  move from lifecycle.py to utils.py
- Add docstrings for shutdown_after_seconds on both sync and async
  create_sandbox()
- Add OTel span attributes for lifecycle shutdown_time and
  shutdown_policy in both sync and async _create_claim()
- Strengthen return type hint to dict[str, str]
- Simplify type check to `type(x) is not int`
- Simplify lifecycle extraction in test (use keyword arg directly)
- Remove unused mock_datetime.side_effect in test
- Move timedelta import to top of integration test file

Made-with: Cursor
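
To make the lifecycle-at-creation change above concrete, here is a minimal sketch of what a helper like `construct_sandbox_claim_lifecycle_spec` might look like. The function name, the `shutdownTime` / `shutdownPolicy: Delete` fields, the `dict[str, str]` return type, and the `type(x) is not int` check come from the commit messages; the exact signature, the timestamp format, and the overflow bound `_MAX_SHUTDOWN_SECONDS` are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumed upper bound (about ten years); the real overflow check is not shown in the PR text.
_MAX_SHUTDOWN_SECONDS = 10 * 365 * 24 * 3600


def construct_sandbox_claim_lifecycle_spec(shutdown_after_seconds: int) -> dict[str, str]:
    """Build a spec.lifecycle dict with an absolute UTC shutdown time."""
    # `type(x) is not int` also rejects bool, which isinstance(x, int) would accept.
    if type(shutdown_after_seconds) is not int:
        raise TypeError("shutdown_after_seconds must be an int")
    if shutdown_after_seconds <= 0:
        raise ValueError("shutdown_after_seconds must be positive")
    if shutdown_after_seconds > _MAX_SHUTDOWN_SECONDS:
        raise ValueError("shutdown_after_seconds is too large")

    shutdown_time = datetime.now(timezone.utc) + timedelta(seconds=shutdown_after_seconds)
    return {
        # Timestamp format is an assumption (RFC 3339 with a trailing "Z").
        "shutdownTime": shutdown_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "shutdownPolicy": "Delete",
    }
```

Because the dict is computed before the claim manifest is submitted, the claim carries its expiry from the first API call, which is the property the commit describes.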
…tes-sigs#347)

* Enable sandboxwarmpool on template updates

* Fetch template once

* Fix lint

* Added additional test case checks

* Add UpdateStrategy

* fix:lint

* Address comments

* Update isSandboxStale

* fix: ut

* Add additional tc

* Address comments

* fix ut

* Remove Semantic Equality check

* Revert "Remove Semantic Equality check"

This reverts commit 33d32d2.

* Add semantic check

* Add check in sandboxclaim

* Check staleness for orphaned sandboxes

* Remove check in sandboxclaim controller

* nit
Adds clients/python/langchain-agent-sandbox, a Python
implementation of the deepagents (>=0.5.0) SandboxBackendProtocol
that wraps a kubernetes-sigs/agent-sandbox Sandbox handle. An
agent running through this backend executes shell commands and
file operations inside a managed sandbox pod rather than on the
host, while presenting the same contract as any other deepagents
backend.

## Package contents

clients/python/langchain-agent-sandbox/:
- langchain_agent_sandbox/backend.py  AgentSandboxBackend class
  implementing every SandboxBackendProtocol method (execute, ls,
  read, write, edit, grep, glob, upload_files, download_files,
  plus all async variants). Also exports:
  - from_template() factory for lifecycle-managed sandboxes via
    direct / gateway / tunnel connection modes
  - SandboxPolicyWrapper (deny_prefixes, deny_commands,
    audit_log) for policy enforcement
  - WarmPoolBackend for warmpool-adopted sandboxes
  - create_sandbox_backend_factory() helper for
    `create_deep_agent(backend=...)`
- langchain_agent_sandbox/__init__.py  public exports
- tests/test_backend.py  88 unit tests using a StubSandbox
  (SimpleNamespace + Mock), covering the protocol surface, path
  virtualization, policy wrapper, warm pool, and every fix below
- pyproject.toml  deepagents>=0.5.0 + k8s-agent-sandbox
- README.md, uv.lock

examples/langchain-deepagents/:
- main.py  minimal end-to-end example that runs a deepagents
  agent against a provisioned sandbox
- sandbox-template.yaml  a SandboxTemplate the example claims
  from
- README.md + run-test-kind.sh  kind workflow walkthrough
- .deepagents/skills/*  example skill files

test/e2e/clients/python/test_e2e_langchain_backend.py: env-gated
kind integration test that exercises execute, write, read, edit,
grep, glob, upload_files, and download_files against a real
sandbox pod. Skips silently when LANGCHAIN_SANDBOX_TEMPLATE is
unset.
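
As a rough illustration of the env gating described above, the sketch below reads the `LANGCHAIN_*` variables and returns `None` when `LANGCHAIN_SANDBOX_TEMPLATE` is unset, at which point a test would skip. The helper name `load_e2e_config`, the returned keys, and the default values are hypothetical; only the variable names are taken from this PR.

```python
import os
from typing import Mapping, Optional


def load_e2e_config(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return e2e settings, or None when the gating variable is unset."""
    template = env.get("LANGCHAIN_SANDBOX_TEMPLATE")
    if not template:
        # A pytest test would call pytest.skip(...) at this point.
        return None
    return {
        "template": template,
        # The "default" namespace fallback is an assumption, not taken from the PR.
        "namespace": env.get("LANGCHAIN_NAMESPACE", "default"),
        # Treating "1" as the truthy value mirrors LANGCHAIN_USE_TUNNEL=1 in the kind run.
        "use_tunnel": env.get("LANGCHAIN_USE_TUNNEL", "") == "1",
    }
```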

## Repo integration

- Makefile: `test-langchain` target runs the unit suite
  (`uv run pytest clients/python/langchain-agent-sandbox/tests/
   -v --junitxml=bin/langchain-backend-junit.xml`).
- dev/tools/test-e2e: `setup_python_sdk` pip-installs
  `langchain-agent-sandbox[test]` if the directory is present,
  and `run_python_e2e_tests` discovers the e2e test through the
  standard pytest invocation on `test/e2e/`.
- test/e2e/README.md: documents the `LANGCHAIN_SANDBOX_TEMPLATE`,
  `LANGCHAIN_NAMESPACE`, `LANGCHAIN_GATEWAY_NAME`,
  `LANGCHAIN_API_URL`, `LANGCHAIN_USE_TUNNEL`,
  `LANGCHAIN_SERVER_PORT`, and `LANGCHAIN_ROOT_DIR` env vars.
- examples/README.md: links to the new example.

## deepagents 0.5.x protocol compliance

deepagents 0.5.0 renamed the backend protocol method set and
replaced plain returns with typed result dataclasses. This
backend targets the new API from the start:

- ls_info -> ls returning LsResult
- grep_raw -> grep returning GrepResult
- glob_info -> glob returning GlobResult
- read returning ReadResult(file_data=FileData(content=...,
  encoding="utf-8")) with raw content (the middleware handles
  line numbering via format_content_with_line_numbers, so the
  backend returns unformatted output)
- WriteResult / EditResult constructed without the deprecated
  `files_update` kwarg (explicit None emits a DeprecationWarning
  in 0.5.x)
- execute / aexecute accept a keyword-only `timeout: Optional[int]
  = None` matching the new SandboxBackendProtocol signature

## Error-handling hardening

All error paths are surfaced through the typed result fields so
the deepagents middleware can react without losing context:

- ls / grep / glob: sandbox-side command invocation is wrapped in
  try/except and exceptions surface via the matching
  LsResult / GrepResult / GlobResult(error="..."). On
  `exit_code != 0` the stderr is propagated into the error field
  alongside an empty entries/matches list rather than a stale
  stdout.
- read / edit: strict utf-8 decode (no `errors="replace"`) so
  non-UTF-8 files report a typed error instead of silently
  producing lossy content labelled as utf-8.
- read: empty files return empty content regardless of offset;
  offset >= len(lines) on a non-empty file returns
  ReadResult(error="Line offset N exceeds file length...").
- execute: distinguishes TimeoutError (exit_code=-2, output
  prefixed with "Timed out") from other failures (exit_code=-1,
  "Error:" prefix).

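The read rules listed above (strict UTF-8 decoding, empty files, and out-of-range offsets) can be sketched as follows. `ReadOutcome` is a hypothetical stand-in for the deepagents `ReadResult` type, and the `offset` / `limit` parameters and the error wording are assumptions; this is not the backend's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadOutcome:
    """Hypothetical stand-in for a typed read result."""
    content: Optional[str] = None
    error: Optional[str] = None


def read_lines(raw: bytes, offset: int = 0, limit: int = 2000) -> ReadOutcome:
    try:
        # bytes.decode is strict by default: invalid bytes raise instead of being replaced.
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return ReadOutcome(error=f"File is not valid UTF-8: {exc}")

    if text == "":
        # Empty files return empty content regardless of offset.
        return ReadOutcome(content="")

    lines = text.splitlines(keepends=True)
    if offset >= len(lines):
        return ReadOutcome(
            error=f"Line offset {offset} exceeds file length ({len(lines)} lines)"
        )
    return ReadOutcome(content="".join(lines[offset:offset + limit]))
```
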
## Policy wrapper

SandboxPolicyWrapper wraps any AgentSandboxBackend and enforces
three rules at call time:
- deny_prefixes (writes / edits / uploads): path-prefix deny
  list, canonicalized so traversal-style bypasses like
  `/app/../etc` are caught
- deny_commands (execute): substring match against a deny list;
  returns ExecuteResponse with exit_code=1 and a
  "Policy denied" prefix
- audit_log: optional callback invoked with
  (operation, target, metadata) on every write / edit / execute /
  upload

Read operations pass through without checks.
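
A minimal sketch of the two deny checks, assuming POSIX paths inside the sandbox. The helper names `is_path_denied` and `is_command_denied` are hypothetical, and matching prefixes on path-component boundaries is an assumption about the intended behavior rather than something the PR states.

```python
import posixpath
from typing import Sequence


def _canonicalize(path: str) -> str:
    # Collapse "..", "." and repeated slashes so "/app/../etc" becomes "/etc".
    return posixpath.normpath("/" + path.lstrip("/"))


def is_path_denied(path: str, deny_prefixes: Sequence[str]) -> bool:
    canonical = _canonicalize(path)
    for prefix in deny_prefixes:
        root = _canonicalize(prefix)
        # Match the prefix itself or anything beneath it, but not siblings like "/etcetera".
        if canonical == root or canonical.startswith(root.rstrip("/") + "/"):
            return True
    return False


def is_command_denied(command: str, deny_commands: Sequence[str]) -> bool:
    # Plain substring match, as described for deny_commands.
    return any(token in command for token in deny_commands)
```

Substring matching on commands is easy to evade (for example through quoting or variable expansion), which is consistent with treating the wrapper as a guardrail rather than a security boundary.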

## kind e2e

Running the Python e2e against a real kind cluster
(`LANGCHAIN_SANDBOX_TEMPLATE=df-standard
 LANGCHAIN_NAMESPACE=darkfactory LANGCHAIN_USE_TUNNEL=1
 KUBECONFIG=bin/KUBECONFIG`) exercises the full backend surface
against a live sandbox pod:

- execute -> shell command round-trip
- write -> /langchain_e2e.txt created with 3 lines
- read -> content reflects the write
- edit(replace_all=False) -> single-occurrence replacement
- grep -> finds matches by literal pattern
- glob("**/langchain_e2e.txt") -> matches the file at the root
  of the search path
- upload_files([("/nested/dir/extra.txt", ...)]) -> creates the
  nested directory chain on demand and uploads the payload
- download_files -> round-trips the bytes

All paths green. Four pre-existing backend bugs surfaced during
this run and are fixed in this commit:

1. grep command appended `2>&1` as a shell redirect, but the
   sandbox runtime runs commands via subprocess.run + shlex.split
   (no shell). `2>&1` became a literal grep argument, grep tried
   to open a file named `2>&1`, failed with exit 2, and the
   exit-code-based error detection flagged real matches as
   errors. Dropped the suffix; grep's stderr goes to the runtime's
   stderr channel.

2. glob's `**` support was broken.
   pathlib.PurePosixPath.match in Python 3.11 treats `**` as two
   consecutive `*` wildcards, NOT as recursive globstar, so
   `**/target.txt` failed to match `target.txt` at the root.
   Replaced the PurePath.match call with a dedicated
   `_compile_glob` helper that translates the pattern to a regex
   with proper `**` handling (zero-or-more path components).
   Patterns without any `/` fall back to basename-only matching
   so `glob("*.py")` still means "any .py in the tree".

3. upload_files refused paths with missing parent directories,
   returning `error="invalid_path"` instead of creating the
   parent chain on demand. write() already calls
   `_ensure_parent_dir` (mkdir -p) before uploading, so the two
   write APIs were inconsistent. upload_files now calls
   `_ensure_parent_dir` when parent_state is "missing".

4. test_e2e_langchain_backend.py used a stale
   `SandboxClient(template_name=..., namespace=..., gateway_name=
    ..., ...)` constructor signature that the upstream
   k8s_agent_sandbox.SandboxClient no longer accepts. Switched to
   `AgentSandboxBackend.from_template()` which wires the current
   SandboxClient API internally and presents the same option
   set.
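
For fix 2, the following is one possible way to translate a glob with `**` into a regex. It is a sketch, not the PR's `_compile_glob`: the name `compile_glob`, the lack of support for character classes, and the choice to make a trailing `**` require at least one further component (the gitignore-style behavior mentioned later in this PR) are assumptions.

```python
import re
from typing import Callable


def _segment_to_regex(segment: str) -> str:
    """Translate one path component; '*' and '?' never cross a '/'."""
    out = []
    for ch in segment:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def compile_glob(pattern: str) -> Callable[[str], bool]:
    if "/" not in pattern:
        # No slash: match the basename only, so "*.py" means any .py in the tree.
        basename_re = re.compile(_segment_to_regex(pattern))
        return lambda path: basename_re.fullmatch(path.rsplit("/", 1)[-1]) is not None

    parts = pattern.strip("/").split("/")
    pieces = []
    for index, part in enumerate(parts):
        is_last = index == len(parts) - 1
        if part == "**":
            # Leading/middle "**": zero or more whole components. Trailing: at least one.
            pieces.append(".+" if is_last else "(?:[^/]+/)*")
        else:
            pieces.append(_segment_to_regex(part) + ("" if is_last else "/"))
    full_re = re.compile("".join(pieces))
    return lambda path: full_re.fullmatch(path.lstrip("/")) is not None
```

Because each non-final `**` expands to whole `component/` groups, `a/**/b` matches `a/b` and `a/x/y/b` but not `ab`.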

## Test results

- 88 unit tests pass under `-W error::DeprecationWarning`
- Python e2e passes end-to-end against the real kind cluster
  (test_langchain_backend_basic)
- No `Any` or untyped leak points; type annotations throughout
- Apache-2.0 headers on every new file
@johannhartmann johannhartmann force-pushed the feat-langchain-deepagents-backend branch 2 times, most recently from 2012af5 to 40e1009 Compare April 8, 2026 13:50
Eleven fixes layered on the existing PR. 123 unit tests pass; kind
e2e against a real cluster still passes. No public-API breakage.

Critical:
- execute() timeout detection was dead code: the SDK wraps
  requests.exceptions.Timeout into SandboxRequestError via
  `raise ... from e`, never matching `except TimeoutError`. New
  _is_timeout_exception() walks __cause__/__context__ and detects
  builtin TimeoutError, requests/httpx Timeout, plus a duck-typed
  name fallback for future SDK exceptions that don't chain.
- _compile_glob middle-`**` matched bare `ab` for `a/**/b`. Rewrote
  the translator to handle leading/middle/trailing `**` distinctly;
  trailing follows the gitignore semantic (`a/**` rejects bare `a`).
- create_sandbox_backend_factory returned an un-entered backend, so
  the first call AttributeError'd. Now eagerly enters and registers
  a weakref.finalize for GC/shutdown teardown. The handle is exposed
  as backend._finalizer for deterministic test invocation.
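
The wrapped-timeout detection described in the first bullet can be sketched roughly as below. This is an illustration, not the PR's `_is_timeout_exception`: it relies only on the builtin `TimeoutError` and a class-name check (which would also cover names such as `ReadTimeout`), and it follows a single chain, preferring `__cause__` over `__context__`.

```python
from typing import Optional, Set


def is_timeout_exception(exc: BaseException) -> bool:
    """Walk the exception chain looking for anything timeout-like."""
    seen: Set[int] = set()  # guards against cyclic chains
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if isinstance(current, TimeoutError):
            return True
        # Duck-typed fallback: match on the class name instead of importing HTTP libraries.
        if "timeout" in type(current).__name__.lower():
            return True
        if current.__cause__ is not None:
            current = current.__cause__  # set by `raise ... from e`
        else:
            current = current.__context__  # set implicitly inside an except block
    return False
```

With a check of this shape, an SDK error raised via `raise ... from e` around a timeout is still classified as a timeout, which a bare `except TimeoutError` cannot do.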

Important:
- __exit__ silently swallowed delete failures. Now re-raises on
  the happy path; on user-exception unwind raises BaseExceptionGroup
  so neither the user error nor the leak signal is lost.
- _factory_atexit_cleanup blanket-swallowed every error. Now filters
  only HTTP 404 (the redundant-cleanup case) and logs everything
  else at ERROR with a shutdown-safe inner guard.
- SandboxPolicyWrapper gained `strict_audit: bool = False` (kw-only).
  When True, audit-callback failures refuse the operation; deny
  detail propagates through write/edit/upload result fields instead
  of flattening to "policy_denied". Refactored four inline audit
  blocks into a single _emit_audit() helper. Default unchanged.
- _emit_audit log lines now include operation, target, metadata, and
  exc_info=True for SRE diagnosability.
- glob() catches non-re.error compilation failures (IndexError,
  TypeError, RecursionError) and returns a typed GlobResult.
- grep() error detail reads stderr first (with stdout/exit-code
  fallbacks) instead of always-empty stdout.
- SandboxPolicyWrapper docstring rewritten to drop "enterprise-grade
  restrictions" — it's an application-layer guardrail, not a
  security boundary. The runtime (gVisor/Kata) is the boundary.
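
To illustrate the `strict_audit` behavior, here is a hedged sketch of a single audit helper in the spirit of `_emit_audit()`. The function name `emit_audit`, its return convention (a denial reason or `None`), and the message text are assumptions; the `(operation, target, metadata)` callback shape, the fail-open default, and `exc_info=True` logging follow the description above.

```python
import logging
from typing import Any, Callable, Mapping, Optional

logger = logging.getLogger(__name__)

AuditCallback = Callable[[str, str, Mapping[str, Any]], None]


def emit_audit(
    audit_log: Optional[AuditCallback],
    operation: str,
    target: str,
    metadata: Mapping[str, Any],
    *,
    strict_audit: bool = False,
) -> Optional[str]:
    """Invoke the audit callback; return a denial reason, or None to proceed."""
    if audit_log is None:
        return None
    try:
        audit_log(operation, target, metadata)
    except Exception as exc:
        # Log enough context to diagnose the failure, including the traceback.
        logger.error(
            "audit callback failed: operation=%s target=%s metadata=%r",
            operation, target, dict(metadata), exc_info=True,
        )
        if strict_audit:
            # Fail closed: the caller should refuse the operation.
            return f"Policy denied: audit callback failed ({exc})"
    # Fail open (default): the operation proceeds even if auditing failed.
    return None
```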

Tests: 88 -> 123. New coverage for _compile_glob branches (10
tests), wrapped-timeout detection (real ReadTimeout, not a mock),
factory eager-enter and finalizer (404 vs non-404), __exit__
ExceptionGroup contract, strict_audit on execute/write/edit/upload,
parametrized fail-open coverage for write/edit/upload, and grep
stderr error detail.